Abstract
The International Prognostic Index (IPI) remains widely used for risk stratification in diffuse large B-cell lymphoma (DLBCL). However, its predictive accuracy is suboptimal in the era of immunochemotherapy. Leveraging individual-level data from the phase III GOYA trial (NCT01287741), we developed and validated machine learning (ML) models integrating clinical, laboratory, imaging, and immunohistochemical variables to predict 1-, 2-, and 3-year overall survival (OS) in newly diagnosed DLBCL patients.
We analyzed 1,166 patients with complete OS data from the GOYA trial (2011–2014), a global phase III study comparing G-CHOP versus R-CHOP in untreated DLBCL, accessed via the Vivli platform. OS (204 events) was defined as time from randomization to death from any cause.
Baseline features included 15 variables encompassing clinical factors (e.g., ECOG, age), laboratory markers (e.g., LDH, albumin, lymphocyte and monocyte counts), imaging-based tumor burden including bulky disease (maximum diameter >7.5 cm) and SPD (sum of the product of perpendicular diameters), and immunohistochemical markers (e.g., BCL2, cell of origin [COO]). Patients were randomly split into training and test sets (70:30), stratified by OS status. Continuous variables were standardized. Variables with <1% missing values were imputed using the median or mode; COO (20.8% missing) and BCL2 (35.2% missing) were imputed using multiple imputation with missingness indicators.
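The split and simple imputation steps described above can be illustrated with a minimal sketch. It is not the study's code: the function names `stratified_split` and `impute_median` are hypothetical, the seed is arbitrary, and fitting the median on the training set only is an assumption that the abstract does not state. The multiple imputation used for COO and BCL2 is not shown.

```python
import random
from statistics import median


def stratified_split(events, train_frac=0.7, seed=0):
    """Split indices into train/test, sampling each OS-status stratum separately.

    `events` holds one 0/1 OS status per patient (1 = death observed).
    """
    rng = random.Random(seed)
    train, test = [], []
    for status in sorted(set(events)):
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))  # stratum size assigned to training
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def impute_median(values, train_idx):
    """Replace None with the median of observed training-set values.

    Assumption: the fill value is estimated on the training set only,
    so no information from the test set is used.
    """
    observed = [values[i] for i in train_idx if values[i] is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```

Splitting within each stratum keeps the proportion of events nearly identical in the training and test sets, which matters when events are a minority of the cohort.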
We trained Cox proportional hazards, random survival forest (RSF), and XGBoost models. Hyperparameters were tuned via 5-fold cross-validation for RSF and XGBoost.
Model performance was assessed via 5-fold cross-validation (Cox/XGBoost) or out-of-bag estimation (RSF) in the training dataset, and evaluated using Harrell's concordance index (C-index) and time-dependent area under the ROC curve (AUC).
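For readers unfamiliar with Harrell's C-index, the following is a minimal pure-Python sketch of the standard pairwise definition. The function name `harrell_c_index` is hypothetical, pairs with tied survival times are simply skipped (a simplification), and production analyses would normally rely on an established survival library rather than this O(n²) loop.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event and
    a strictly shorter follow-up time than patient j. The pair is
    concordant when the patient who died earlier has the higher predicted
    risk; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-index values in this abstract should be read.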
Risk stratification was performed using the cumulative hazard function (CHF) predicted by the RSF model. Cutoff values for the high-, medium-, and low-risk groups were calculated from 1,000 bootstrap samples. Kaplan-Meier curves were generated, and the log-rank test was applied to evaluate the quality of the risk stratification.
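One way bootstrap-derived cutoffs could be computed is sketched below. The abstract does not say how the cutoffs were obtained from the bootstrap samples, so averaging the tertile boundaries across resamples is purely an assumption made for illustration; the names `bootstrap_cutoffs` and `risk_group` are hypothetical.

```python
import random
from statistics import mean, quantiles


def bootstrap_cutoffs(chf, n_boot=1000, seed=0):
    """Average the two tertile boundaries of predicted CHF over bootstrap resamples.

    Assumption: tertiles are used as group boundaries; the study may have
    used a different rule.
    """
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        sample = rng.choices(chf, k=len(chf))  # resample with replacement
        low_cut, high_cut = quantiles(sample, n=3)
        lows.append(low_cut)
        highs.append(high_cut)
    return mean(lows), mean(highs)


def risk_group(chf_value, low_cut, high_cut):
    """Assign low / medium / high risk; values on a boundary fall in 'medium'."""
    if chf_value < low_cut:
        return "low"
    if chf_value > high_cut:
        return "high"
    return "medium"
```

Once cutoffs are fixed on the training data, `risk_group` can be applied unchanged to test-set predictions, so the grouping itself is not tuned on the patients used for evaluation.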
In the test set, the RSF model achieved the best performance, with AUCs of 0.78, 0.71, and 0.72 for 1-, 2-, and 3-year OS prediction (C-index: 0.71), followed by XGBoost (AUCs: 0.75, 0.70, 0.71; C-index: 0.69) and Cox (AUCs: 0.69, 0.67, 0.68; C-index: 0.66). The RSF and XGBoost models outperformed the classical IPI model (AUCs: 0.71, 0.67, 0.66; C-index: 0.65) on every metric; the Cox model matched or exceeded IPI on C-index and on 2- and 3-year AUC but not on 1-year AUC (0.69 vs. 0.71). SHAP and variable importance analyses consistently identified albumin, LDH, SPD, and age as the top predictors. RSF-based CHF stratified patients into low (<0.226), medium (0.226–0.330), and high (>0.330) risk groups, with distinct 36-month OS rates: 98.7% (95% confidence interval [CI]: 97-99%), 82.4% (95% CI: 77-89%), and 54.3% (95% CI: 48-61%) (log-rank p<0.001).
ML-based models, particularly RSF, improved OS prediction and risk stratification beyond IPI and Cox models in newly diagnosed DLBCL. Albumin, LDH, SPD, and age emerged as key prognostic variables. These findings support the integration of ML approaches in individualized treatment planning for DLBCL patients.